TritonGPUConfig()

Generates a configuration for running Towhee pipelines with a Triton inference server on GPU devices. See Towhee Pipeline in Triton for details.

TritonGPUConfig(device_ids=[0], num_instances_per_device=1, max_batch_size=None, batch_latency_micros=None, preferred_batch_size=None)

Parameters

  • device_ids - list[int]

    • A list of GPU IDs.

    • The value defaults to [0], indicating that only GPU 0 is used.

  • num_instances_per_device - int

    • The number of model instances to run on each of the specified GPU devices.

    • The value defaults to 1.

  • max_batch_size - int or None

    • The maximum batch size that the models in the pipeline support for the types of batching that can be exploited by Triton. See Maximum Batch Size for details, and the sketch after this parameter list for how the batching fields relate to a Triton model configuration.

    • The value defaults to None, leaving Triton to determine the value.

  • batch_latency_micros - int or None

    • The latency allowed, in microseconds, for Triton to process a delivered batch.

    • The value defaults to None, leaving Triton to determine the value.

  • preferred_batch_size - list[int] or None

    • A list of batch sizes that the Triton dynamic batcher should attempt to create.

    • The value defaults to None, leaving Triton to determine the value.
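
For orientation, the batching and instance fields above roughly correspond to fields in Triton's model configuration (config.pbtxt). Below is a hedged sketch of that correspondence, using the values from the second example further down; the actual file is generated by Towhee's Triton builder and may differ, and the field names here are Triton's, not Towhee's:

max_batch_size: 128
instance_group [
  { kind: KIND_GPU, count: 3, gpus: [ 0 ] }
]
dynamic_batching {
  preferred_batch_size: [ 8, 16 ]
  max_queue_delay_microseconds: 100000
}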

Returns

A TowheeConfig object whose server key is set to a dictionary containing the specified parameters and their values; preferred_batch_size is nested under a triton sub-dictionary, as shown in the example below.

Example

from towhee import pipe, ops, AutoConfig

auto_config1 = AutoConfig.TritonGPUConfig()
auto_config1.config
# returns {'server': {'device_ids': [0], 'num_instances_per_device': 1, 'max_batch_size': None, 'batch_latency_micros': None, 'triton': {'preferred_batch_size': None}}}

# ...or set the configuration explicitly
auto_config2 = AutoConfig.TritonGPUConfig(device_ids=[0],
                                          num_instances_per_device=3,
                                          max_batch_size=128,
                                          batch_latency_micros=100000,
                                          preferred_batch_size=[8, 16])
auto_config2.config
# returns {'server': {'device_ids': [0], 'num_instances_per_device': 3, 'max_batch_size': 128, 'batch_latency_micros': 100000, 'triton': {'preferred_batch_size': [8, 16]}}}

# configurations can also be combined by adding them
auto_config3 = AutoConfig.LocalGPUConfig() + AutoConfig.TritonGPUConfig()
auto_config3.config
# returns {'device': 0, 'server': {'device_ids': [0], 'num_instances_per_device': 1, 'max_batch_size': None, 'batch_latency_micros': None, 'triton': {'preferred_batch_size': None}}}
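
Below is a hedged sketch of attaching the configuration to a pipeline node. The config keyword on map and the operators used here are assumptions for illustration, not part of this reference:

from towhee import pipe, ops, AutoConfig

# A minimal sketch, assuming map() accepts a config keyword and that the
# image_decode.cv2 and image_embedding.timm operators are available.
p = (
    pipe.input('url')
        .map('url', 'image', ops.image_decode.cv2())
        .map('image', 'vec',
             ops.image_embedding.timm(model_name='resnet50'),
             config=AutoConfig.TritonGPUConfig(device_ids=[0],
                                               max_batch_size=128))
        .output('vec')
)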